One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning
Authors
Abstract
Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer unnatural mouth shapes and asynchronous lips because those methods struggle to learn a consistent speech style from different speakers. We observe that it would be much easier to learn a consistent speech style from a specific speaker, which leads to authentic mouth movements. Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and then transferring audio-driven motion fields to a reference image. Specifically, we develop an Audio-Visual Correlation Transformer (AVCT) that aims to infer talking motions, represented by keypoint-based dense motion fields, from an input audio. In particular, considering that audio may come from different identities in deployment, we incorporate phonemes to represent audio signals. In this manner, our AVCT can inherently generalize to audio spoken by other identities. Moreover, as keypoints are used to represent speakers, our method is agnostic against appearances of the training speaker, and thus allows us to manipulate face images of other identities readily. Considering that different face shapes lead to different motions, a motion field transfer module is exploited to reduce the gap between the training identity and the one-shot reference. Once the motion field of the reference image is obtained, we employ an image renderer to generate its talking videos from an audio clip. Thanks to the learned consistent speaking style, our method generates authentic mouth shapes and vivid movements. Extensive experiments demonstrate that our synthesized videos outperform the state-of-the-art in terms of visual quality and lip-sync.
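The abstract describes a three-stage data flow: phoneme-based audio features are mapped by the AVCT to keypoint-based dense motion fields, a transfer module adapts that motion to the reference face shape, and an image renderer produces the frames. A minimal, purely illustrative sketch of that flow (all function names, shapes, and constants here are assumptions, not the authors' actual API):

```python
# Hypothetical sketch of the pipeline described in the abstract; avct_infer,
# transfer_motion_field, and render_frames are illustrative stand-ins, not
# the paper's real implementation.
import numpy as np

N_KEYPOINTS = 10        # number of facial keypoints (assumed)
FRAME_SIZE = (64, 64)   # toy output resolution (assumed)

def avct_infer(phonemes: list) -> np.ndarray:
    """Stand-in for the Audio-Visual Correlation Transformer (AVCT):
    maps a phoneme sequence to per-frame 2D keypoint displacements.
    Using phonemes keeps the audio representation speaker-independent."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(phonemes), N_KEYPOINTS, 2))

def transfer_motion_field(motion: np.ndarray, ref_keypoints: np.ndarray) -> np.ndarray:
    """Stand-in for the motion field transfer module: rescales the
    training-speaker motion toward the reference face shape."""
    scale = np.linalg.norm(ref_keypoints, axis=-1, keepdims=True).mean()
    return motion * scale

def render_frames(motion: np.ndarray) -> np.ndarray:
    """Stand-in for the image renderer: emits one frame per motion field."""
    return np.zeros((motion.shape[0], *FRAME_SIZE, 3), dtype=np.uint8)

phonemes = ["HH", "AH", "L", "OW"]          # phoneme sequence for "hello"
ref_keypoints = np.ones((N_KEYPOINTS, 2))   # keypoints of the one-shot reference image
motion = transfer_motion_field(avct_infer(phonemes), ref_keypoints)
frames = render_frames(motion)
print(frames.shape)  # one frame per phoneme step: (4, 64, 64, 3)
```

The sketch only shows the tensor shapes and module boundaries implied by the abstract; the real AVCT is a transformer and the renderer is a learned network.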
Similar References
Audio-visual talking face detection
Talking face detection is important for videoconferencing. However, the detection of the talking face is difficult because of the low resolution of the capturing devices, the informal style of communication and the background sounds. In this paper, we present a novel method for finding the talking face using latent semantic indexing approach. We tested our method on a comprehensive set of home ...
Look Who's Talking: Speaker Detection using Video and Audio Correlation
The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speechreading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned ...
An Audio-Visual Imposture Scenario by Talking Face Animation
With the start of the appearance of PDA’s, handheld PC’s, and mobile telephones that use biometric recognition for user authentication, there is higher demand for automatic non-intrusive voice and face speaker verification systems. Such systems can be embedded in mobile devices to allow biometrically recognized users to sign and send data electronically, and to give their telephone conversation...
Exploiting Audio-visual Correlation in Coding of Talking Head Sequences
Ram R. Rao, Georgia Institute of Technology, Atlanta, GA 30332; Tsuhan Chen, AT&T Bell Laboratories, Holmdel, NJ 07733. ABSTRACT: In this paper, we present a novel means for predicting the shape of a person's mouth from the corresponding speech signal and explore applications of this prediction to video coding. One possible application...
One shot learning of simple visual concepts
People can learn visual concepts from just one example, but it remains a mystery how this is accomplished. Many authors have proposed that transferred knowledge from more familiar concepts is a route to one shot learning, but what is the form of this abstract knowledge? One hypothesis is that the sharing of parts is core to one shot learning, and we evaluate this idea in the domain of handwritt...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i3.20154